
txn: fix the resolved txn status cache for pessimistic txn #21689

Merged

merged 4 commits on Dec 14, 2020

Conversation

@cfzjywxk (Contributor) commented Dec 13, 2020

What problem does this PR solve?

Issue Number: close #xxx

Problem Summary:

  • Fix the incorrect transaction status cache used when resolving a pessimistic primary key lock.

What is changed and how it works?

What's Changed:

  • Only save resolved transaction status for prewrite-type locks (see the sketch under "How it Works" below).

How it Works:
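
For prewrite-type locks, the primary key recorded in the lock is always the correct one, so the resolved status can safely be cached by transaction start ts; for pessimistic locks this is not guaranteed, so their status is no longer cached. A minimal, self-contained sketch of the guard, using hypothetical simplified stand-ins for the real types in store/tikv (the actual change is in the diffs reviewed below):

package main

import "fmt"

// Hypothetical simplified stand-ins for the real Lock and TxnStatus types.
type LockType int

const (
	OpPrewrite LockType = iota
	OpPessimisticLock
)

type TxnStatus struct{ ttl, commitTS uint64 } // ttl == 0: txn no longer alive

type LockResolver struct {
	resolved map[uint64]TxnStatus // txn start ts -> resolved status
}

// saveResolvedIfSafe mirrors the guard this PR adds: cache the resolved
// status only for prewrite-type locks, whose primary key is always the
// correct one; skip pessimistic locks, whose primary key may not be.
func (lr *LockResolver) saveResolvedIfSafe(txnID uint64, lt LockType, status TxnStatus) {
	if lt != OpPessimisticLock && status.ttl == 0 {
		lr.resolved[txnID] = status
	}
}

func main() {
	lr := &LockResolver{resolved: map[uint64]TxnStatus{}}
	lr.saveResolvedIfSafe(1, OpPessimisticLock, TxnStatus{}) // skipped
	lr.saveResolvedIfSafe(2, OpPrewrite, TxnStatus{})        // cached
	fmt.Println(len(lr.resolved)) // prints 1
}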

Related changes

  • PR to update pingcap/docs / pingcap/docs-cn:
  • Need to cherry-pick to the release branch

Check List

Tests

  • Unit test

Side effects

  • Possible performance regression for workloads with many conflicts.

Release note

  • Fix the resolved txn status cache for pessimistic transactions.

@sre-bot (Contributor) commented Dec 13, 2020

Please follow PR Title Format:

  • pkg [, pkg2, pkg3]: what's changed

Or, if the number of mainly changed packages is more than 3, use

  • *: what's changed

@cfzjywxk (Contributor Author)

@youjiali1995 Here is a draft fix for the problem.

@cfzjywxk cfzjywxk changed the title [draft]fix the resolved txn status cache for pessimistic txn [draft]txn: fix the resolved txn status cache for pessimistic txn Dec 13, 2020
@cfzjywxk cfzjywxk changed the title [draft]txn: fix the resolved txn status cache for pessimistic txn txn: fix the resolved txn status cache for pessimistic txn Dec 14, 2020
@cfzjywxk cfzjywxk added type/bugfix This PR fixes a bug. sig/transaction SIG:Transaction priority/release-blocker This issue blocks a release. Please solve it ASAP. labels Dec 14, 2020
@cfzjywxk (Contributor Author)

/run-all-tests

@@ -1333,6 +1337,12 @@ func (mvcc *MVCCLevelDB) ResolveLock(startKey, endKey []byte, startTS, commitTS
mvcc.mu.Lock()
defer mvcc.mu.Unlock()

if len(startKey) > 0 {
Contributor:

What's this?

Contributor Author:

Temporarily working around an mvcc leveldb problem; without this fix, resolve lock would not work for split regions.

// - always cache the check txn status result.
// For prewrite locks, their primary keys should ALWAYS be the correct one.
if l.LockType != kvrpcpb.Op_PessimisticLock && status.ttl == 0 {
	logutil.Logger(bo.ctx).Info("saved resolved txn status",
Contributor:

Can it print a lot?

Contributor Author:

We could remove this log.

@@ -1833,3 +1835,71 @@ func (s *testPessimisticSuite) TestAmendForUniqueIndex(c *C) {
tk.MustExec("commit")
tk2.MustExec("admin check table t")
}

func (s *testPessimisticSuite) TestResolveStalePessimisticPrimaryLock(c *C) {
Contributor:

It's complex. I want to confirm: does it fail with the always-cached version?

@cfzjywxk (Contributor Author) Dec 14, 2020

I've tested without the cache change; it fails with an error reported by the admin check statement. It can be treated as a reproducing case in the unit tests. I think we also need to add cases in ticase.

Comment on lines 617 to 619
zap.Stringer("status action", status.action),
zap.Uint64("status ttl", status.ttl),
zap.Uint64("status commitTS", status.commitTS))
Contributor:

Seems it never changes.

Contributor Author:

We could remove this log.

// If l.lockType is prewrite lock type:
// - always cache the check txn status result.
// For prewrite locks, their primary keys should ALWAYS be the correct one.
if l.LockType != kvrpcpb.Op_PessimisticLock && status.ttl == 0 {
Contributor:

It seems that in the current code, we don't cache when the primary lock is of prewrite lock type. Can we move this logic to where saveResolved originally was (L579)? We have more information there, and we can even cache it when the primary lock is committed with a positive ts.

@youjiali1995 (Contributor) Dec 14, 2020

If we want to cache all results, we need to change kvproto.

> And we can even cache it when the primary lock is committed with a positive ts.

It seldom happens...

Contributor Author:

We can't get the lock type of the lock we encountered at the original place, which is needed for the cache check. Should we add back the saving logic for committed transactions at the original place?

Contributor:

OK, I think it's optional, since youjiali1995 says this case seldom happens.

Contributor Author:

@youjiali1995 @sticnarf
In the previous diff, the resolved cache would be checked every time, which is not necessary. I've put the cache-fill logic back in its original place. PTAL again, thanks.

@@ -652,6 +652,10 @@ func (c *twoPhaseCommitter) doActionOnGroupMutations(bo *Backoffer, action twoPh
// by test suites.
secondaryBo := NewBackofferWithVars(context.Background(), CommitMaxBackoff, c.txn.vars)
go func() {
failpoint.Inject("skipCommitSecondaryKeys", func() {
failpoint.Return()
})
Contributor:

How about making this failpoint more general (e.g. like the following), so that we can either disable it with return("skip") or delay it with sleep(1000)?

failpoint.Inject("beforeCommitSecondaries", func(v failpoint.Value) {
	if s, ok := v.(string); !ok {
	} else if s == "skip" {
		failpoint.Return()
	}
})
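
A test could then activate the failpoint with return("skip") to skip committing secondaries, or sleep(1000) to delay it. The failpoint path below is an assumption for illustration:

// Assumed failpoint path; the real one depends on the package import path.
c.Assert(failpoint.Enable("github.com/pingcap/tidb/store/tikv/beforeCommitSecondaries", `return("skip")`), IsNil)
defer func() {
	c.Assert(failpoint.Disable("github.com/pingcap/tidb/store/tikv/beforeCommitSecondaries"), IsNil)
}()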

@cfzjywxk cfzjywxk added this to the v4.0.9 milestone Dec 14, 2020
@cfzjywxk (Contributor Author)

/run-all-tests

@@ -229,6 +229,9 @@ func (lr *LockResolver) BatchResolveLocks(bo *Backoffer, locks []*Lock, loc Regi
if err != nil {
return false, err
}
if l.LockType != kvrpcpb.Op_PessimisticLock && status.ttl == 0 {
	lr.saveResolved(l.TxnID, status)
@jackysp (Member) Dec 14, 2020

Could you combine TxnID and the primary's LockForUpdateTS as the key of the LockResolver.resolved cache? I'm worried about the performance impact of removing the cache.
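
A minimal sketch of that suggestion, with hypothetical field names (this is not the design that was merged):

// Hypothetical composite key for the LockResolver.resolved cache, so a
// cached status is only reused when both the txn start ts and the
// primary lock's for-update ts match.
type resolvedKey struct {
	txnID       uint64 // transaction start ts
	forUpdateTS uint64 // LockForUpdateTS of the primary lock
}

// resolved would become map[resolvedKey]TxnStatus instead of map[uint64]TxnStatus.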

Contributor Author:

The impact now is that resolving pessimistic locks whose primary lock is of prewrite lock type needs more check_txn_status calls. In the future, we could mark the returned status as certain or not to lower the risk.
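
One way to read that idea, as a purely illustrative sketch (not part of this PR, and per the discussion above it would require a kvproto change): extend the returned status with a flag saying whether it is authoritative, and only cache authoritative results.

// Illustrative only; not in this PR.
type TxnStatus struct {
	ttl      uint64
	commitTS uint64
	certain  bool // whether the status is safe to cache
}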

@youjiali1995 (Contributor) left a comment

lgtm

@ti-srebot ti-srebot added the status/LGT1 Indicates that a PR has LGTM 1. label Dec 14, 2020
@cfzjywxk (Contributor Author)

/run-unit-test

@sticnarf (Contributor)

LGTM

@ti-srebot ti-srebot removed the status/LGT1 Indicates that a PR has LGTM 1. label Dec 14, 2020
ti-srebot previously approved these changes Dec 14, 2020
@ti-srebot ti-srebot added the status/LGT2 Indicates that a PR has LGTM 2. label Dec 14, 2020
@cfzjywxk cfzjywxk added the require-LGT3 Indicates that the PR requires three LGTM. label Dec 14, 2020
@youjiali1995 (Contributor) left a comment

lgtm

@sticnarf (Contributor) left a comment

LGTM

@coocood (Member) left a comment

LGTM

@ti-srebot ti-srebot removed the status/LGT2 Indicates that a PR has LGTM 2. label Dec 14, 2020
@ti-srebot ti-srebot added the status/LGT3 The PR has already had 3 LGTM. label Dec 14, 2020
@jebter commented Dec 14, 2020

/merge

@ti-srebot ti-srebot added the status/can-merge Indicates a PR has been approved by a committer. label Dec 14, 2020
@ti-srebot (Contributor)

/run-all-tests

@jackysp (Member) left a comment

lgtm
